The AMI Meeting Corpus
Abstract
To support multi-disciplinary research in the AMI (Augmented Multi-party Interaction) project, a 100-hour corpus of meetings is being collected. This corpus is being recorded in several instrumented rooms equipped with a variety of microphones, video cameras, electronic pens, presentation slide capture and white-board capture devices. As well as real meetings, the corpus contains a significant proportion of scenario-driven meetings, which have been designed to elicit a rich range of realistic behaviors. To facilitate research, the raw data are being annotated at a number of levels including speech transcriptions, dialogue acts and summaries. The corpus is being distributed using a web server designed to allow convenient browsing and download of multimedia content and associated annotations. This article first gives an overview of AMI research themes, then discusses corpus design, data collection, annotation and distribution.
Related resources
Combining Multiple Information Layers for the Automatic Generation of Indicative Meeting Abstracts
We describe a new application for NLG technology: the generation of indicative, abstractive summaries of multi-party meetings. Based on the freely available AMI corpus of 100 hours of recorded meetings, we are developing a summarizer that uses the rich annotations in the AMI corpus.
Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus
The AMI Meeting Corpus is now publicly available, including manual annotation files generated in the NXT XML format, but lacking explicit metadata for the 171 meetings of the corpus. To increase the usability of this important resource, a representation format based on relational databases is proposed, which maximizes informativeness, simplicity and reusability of the metadata and annotations. ...
Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus
Creating the AMI Meeting Corpus was an ambitious endeavour, probably more ambitious than the people who first thought of it realize even now. It contains 100 hours of meetings captured using a whole host of synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcrib...
A Multimodal Corpus for Studying Dominance in Small Group Conversations
We present a new multimodal corpus with dominance annotations on small group conversations. We used five-minute non-overlapping slices from a subset of meetings selected from the popular Augmented Multi-party Interaction (AMI) corpus. The total length of the annotated corpus corresponds to 10 hours of meeting data. Each meeting is observed and assessed by three annotators according to their lev...
The AMI Meeting Corpus: A Pre-announcement
The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. It is being created in the context of a project that is developing meeting browsing technology and will eventually be released publicly. Some of the meetings it contains are naturally occurring, and some are elicited, particularly using a scenario in which the participants play different roles in a d...